Conversation

@mgorny
Contributor

@mgorny mgorny commented Oct 23, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #430. Added py3.14 migration too, let's see how it goes.

mgorny and others added 3 commits October 23, 2025 14:38
Signed-off-by: Michał Górny <[email protected]>
Fixes conda-forge#420

Signed-off-by: Michał Górny <[email protected]>
…5.10.23.10.43.38

Other tools:
- conda-build 25.9.0
- rattler-build 0.48.1
- rattler-build-conda-compat 1.4.9
@h-vetinari
Member

> let's see how it goes.

During debugging, can you please reduce the number of builds to ~one python version? Or maybe one+3.14

@conda-forge-admin
Contributor

conda-forge-admin commented Oct 23, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/18886564008. Examine the logs at this URL for more detail.

@mgorny
Contributor Author

mgorny commented Oct 23, 2025

> let's see how it goes.
>
> During debugging, can you please reduce the number of builds to ~one python version? Or maybe one+3.14

Sure. Could you stop the Azure builds then?

@mgorny mgorny closed this Oct 23, 2025
@mgorny mgorny reopened this Oct 23, 2025
@h-vetinari
Member

> Sure. Could you stop the Azure builds then?

Azure is much less critical in terms of resource scarcity, but if you want, sure. The easiest way is to just push a skip, though.

@mgorny
Contributor Author

mgorny commented Oct 23, 2025

I'll also strip down CUDA targets.

Signed-off-by: Michał Górny <[email protected]>
…5.10.23.10.43.38

Other tools:
- conda-build 25.9.0
- rattler-build 0.48.1
- rattler-build-conda-compat 1.4.9

Signed-off-by: Michał Górny <[email protected]>
@mgorny
Contributor Author

mgorny commented Oct 23, 2025

(mumbling about platform-conditional patching)

Signed-off-by: Michał Górny <[email protected]>
@mgorny
Contributor Author

mgorny commented Oct 23, 2025

I have an updated patch ready, but will wait for other jobs to finish.

Signed-off-by: Michał Górny <[email protected]>
…5.10.23.15.25.29

Other tools:
- conda-build 25.9.0
- rattler-build 0.48.1
- rattler-build-conda-compat 1.4.9

Signed-off-by: Michał Górny <[email protected]>
@mgorny
Contributor Author

mgorny commented Oct 24, 2025

Looks like it's not ready for py3.14 yet. I'm going to look for other solutions for triton later today.

@mgorny
Contributor Author

mgorny commented Oct 24, 2025

Okay, so:

  • linux-64 jobs are failing over being unable to find pybind11 headers
  • linux-aarch64 jobs are failing over some computation assert (help?)
  • win-64 jobs are failing over asmjit.dll install — I must have failed at updating the patch

@h-vetinari
Member

I think the conclusion from #413 was that we'll have to add in the pybind migration (cf. #415) for v2.9

regro-cf-autotick-bot and others added 2 commits October 25, 2025 20:36
…5.10.24.23.37.09

Other tools:
- conda-build 25.9.0
- rattler-build 0.48.1
- rattler-build-conda-compat 1.4.9

Signed-off-by: Michał Górny <[email protected]>
@mgorny mgorny mentioned this pull request Oct 25, 2025
3 tasks
@mgorny
Contributor Author

mgorny commented Oct 26, 2025

I suppose we're making some progress:

  • CUDA builds now fail because they can't find cuda.h — need to check CUDA path patches
  • non-CUDA CMake test fails over being unable to find libfmt — looks like a missing dependency, though curious why it didn't fail before (diff suggests it was already present)
  • AArch64 still fails over the assert: the value is apparently tensor(0.+0.j, dtype=torch.complex128), while we expected .real == 4.0
  • Windows patch still needs fixing to install asmjit.dll; but looking at the log, I'm not sure if it's even being created now

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

> AArch64 still fails over the assert: the value is apparently tensor(0.+0.j, dtype=torch.complex128), while we expected .real == 4.0

I'm going to check whether setting PYTORCH_BLAS_USE_CBLAS_DOT=1 helps. pytorch/pytorch@8f0998a might be related.

> non-CUDA CMake test fails over being unable to find libfmt — looks like a missing dependency, though curious why it didn't fail before (diff suggests it was already present)

pytorch/pytorch#164139 so an upstream regression, I guess.

Signed-off-by: Michał Górny <[email protected]>
@mgorny
Contributor Author

mgorny commented Oct 27, 2025

> Windows patch still needs fixing to install asmjit.dll; but looking at the log, I'm not sure if it's even being created now

Yep, looks like it isn't being built at all anymore, on Windows (on Linux it seems to be). So now the question is: is this an upstream change, or a regression on our end? And if the former, do we want to follow or change it back?

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

2025-10-25T18:52:27.0109528Z   CMake Warning at CMakeLists.txt:841 (message):
2025-10-25T18:52:27.0111736Z     x64 operating system is required for FBGEMM.  Not compiling with FBGEMM.
2025-10-25T18:52:27.0113700Z     Turn this warning off by USE_FBGEMM=OFF.

The code checks for x86_64, but Windows reports AMD64.
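
The mismatch can be shown outside CMake. The following is a minimal Python sketch, assuming the architecture string is compared literally against "x86_64"; `is_x86_64`, `naive_is_x86_64`, and the alias set are hypothetical names for illustration, not PyTorch code. Python's `platform.machine()` shows the same split: it typically returns "AMD64" on 64-bit Windows and "x86_64" on Linux.

```python
import platform

# Spellings commonly reported for the same 64-bit x86 architecture.
# The set is an assumption for this sketch, not an exhaustive list.
X86_64_ALIASES = {"x86_64", "amd64", "x64"}


def naive_is_x86_64(machine: str) -> bool:
    """Literal comparison, the kind of check that misses Windows' AMD64."""
    return machine == "x86_64"


def is_x86_64(machine: str) -> bool:
    """Return True if `machine` names 64-bit x86, ignoring case."""
    return machine.strip().lower() in X86_64_ALIASES


if __name__ == "__main__":
    host = platform.machine()
    print(host, naive_is_x86_64(host), is_x86_64(host))
```

A case-insensitive alias lookup is only one way to normalize the value; the actual fix would belong in the CMake condition that emits the warning above.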

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

> CUDA builds now fail because they can't find cuda.h — need to check CUDA path patches

So the command used in tests is:

2025-10-25T23:05:32.7696053Z $PREFIX/bin/x86_64-conda-linux-gnu-c++ /tmp/tmp8qzb4wz3/header.hpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX2 -D USE_CUDA -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -I$PREFIX/include/python3.12 -I$PREFIX/include -I$PREFIX/lib/python3.12/site-packages/torch/include -I$PREFIX/include/torch/csrc/api/include -I$PREFIX/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -mavx2 -mfma -mf16c -E -P -o /tmp/tmp8qzb4wz3/header.i

which is missing the targets/x86-64 directory.

+ python_include_dirs
+ torch_include_dirs
+ omp_include_dir_paths
+ + [sys.prefix + '/include']
+ + [sys.prefix + '/targets/@CUDA_TARGET@/include']
Contributor Author

For the record, I'm aware that the path is left here with a literal @CUDA_TARGET@ on Windows/non-CUDA builds. I figured it won't do any harm and should keep the patch simpler.
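
A minimal sketch of why the unsubstituted placeholder should be harmless, assuming the patch simply appends two paths under the prefix. `cuda_include_dirs` and `existing_dirs` are hypothetical helpers, not the patched PyTorch code. The point is that a directory named after the literal placeholder does not exist, and GCC, to my knowledge, skips nonexistent `-I` directories instead of failing.

```python
import os

# Literal placeholder that remains when nothing substitutes it
# (assumed here for Windows and non-CUDA builds).
PLACEHOLDER = "@CUDA_TARGET@"


def cuda_include_dirs(prefix: str, cuda_target: str = PLACEHOLDER) -> list:
    """Build the two extra include directories the patch appends."""
    return [
        os.path.join(prefix, "include"),
        os.path.join(prefix, "targets", cuda_target, "include"),
    ]


def existing_dirs(paths: list) -> list:
    """Keep only directories that exist, to show which entries matter."""
    return [p for p in paths if os.path.isdir(p)]
```

With an unsubstituted placeholder, `existing_dirs(cuda_include_dirs(prefix))` returns at most the plain `include` directory, so the extra entry has no effect on header lookup.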

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

Uh, so it looks like FBGEMM fails to build on Windows now?

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

Looks like neither the AArch64 fix helped with the assert, nor did the libfmt patch help with installing the headers. Maybe at least the CUDA path patch helped…

@mgorny
Contributor Author

mgorny commented Oct 27, 2025

Okay, that libfmt error is kinda our problem. pytorch installs the headers all right:

lib/python3.12/site-packages/torch/include/fmt/args.h
lib/python3.12/site-packages/torch/include/fmt/base.h
lib/python3.12/site-packages/torch/include/fmt/chrono.h
lib/python3.12/site-packages/torch/include/fmt/color.h
lib/python3.12/site-packages/torch/include/fmt/compile.h
lib/python3.12/site-packages/torch/include/fmt/core.h
lib/python3.12/site-packages/torch/include/fmt/format.h
lib/python3.12/site-packages/torch/include/fmt/format-inl.h
lib/python3.12/site-packages/torch/include/fmt/os.h
lib/python3.12/site-packages/torch/include/fmt/ostream.h
lib/python3.12/site-packages/torch/include/fmt/printf.h
lib/python3.12/site-packages/torch/include/fmt/ranges.h
lib/python3.12/site-packages/torch/include/fmt/std.h
lib/python3.12/site-packages/torch/include/fmt/xchar.h

but we copy some of the headers into the top include directory, and expect that to work. Except that PyTorch's headers expect <fmt/format.h> include to work directly, and we can't install that directly into include. So I suppose the options are to either:

  1. Depend on system fmt and hope it won't cause compatibility issues.
  2. Install fmt headers to a subdirectory, and patch PyTorch's includes to use that.
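
The two options can be sketched as a header lookup problem. This is a minimal Python model of how `#include <fmt/format.h>` is resolved against a list of include directories; `resolve_include`, `touch`, `demo`, and the `torch/vendored` subdirectory are hypothetical names chosen for illustration, not the recipe's actual layout.

```python
import os
import tempfile
from typing import Optional


def resolve_include(name: str, include_dirs: list) -> Optional[str]:
    """Return the first path where `#include <name>` would be found."""
    for directory in include_dirs:
        candidate = os.path.join(directory, *name.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


def touch(path: str) -> None:
    """Create an empty file, including missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


def demo() -> tuple:
    """Return (missing, option1_found, option2_unpatched_missing, option2_patched_found)."""
    with tempfile.TemporaryDirectory() as prefix:
        include = os.path.join(prefix, "include")
        os.makedirs(include)
        # Current state: no fmt/ directory in the top-level include dir.
        missing = resolve_include("fmt/format.h", [include]) is None
        # Option 1: a separate fmt package provides include/fmt/.
        touch(os.path.join(include, "fmt", "format.h"))
        option1 = resolve_include("fmt/format.h", [include]) is not None
    with tempfile.TemporaryDirectory() as prefix:
        include = os.path.join(prefix, "include")
        # Option 2: headers vendored under a subdirectory (hypothetical path).
        touch(os.path.join(include, "torch", "vendored", "fmt", "format.h"))
        unpatched = resolve_include("fmt/format.h", [include]) is None
        patched = resolve_include("torch/vendored/fmt/format.h", [include]) is not None
    return missing, option1, unpatched, patched
```

In this model, option 2 only works once the include directives themselves are changed, which is why it implies patching PyTorch's headers.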

@h-vetinari
Member

> So I suppose the options are to either:
>
>   1. Depend on system fmt and hope it won't cause compatibility issues.
>   2. Install fmt headers to a subdirectory, and patch PyTorch's includes to use that.

My preference would be to try 1. (especially since they seem to have migrated to fmt 12 already; needs inclusion of the respective migrator here), but to limit how much effort we put into this. Basically, as soon as using the native fmt fails in some non-trivial way, go back to vendored fmt (and patch the include paths to use the vendored version)

@mgorny
Contributor Author

mgorny commented Oct 28, 2025

Le sigh, looks like more CUDA work needed:

$PREFIX/bin/../lib/gcc/x86_64-conda-linux-gnu/14.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lcuda: No such file or directory

I really do wonder why it didn't fail before.

This reverts commit 29b1abe.
FBGEMM doesn't build on Windows anymore.

Signed-off-by: Michał Górny <[email protected]>
This reverts commit 550e5c3.
We'll try system libfmt instead.

Signed-off-by: Michał Górny <[email protected]>
Signed-off-by: Michał Górny <[email protected]>
Signed-off-by: Michał Górny <[email protected]>
…5.10.28.08.08.15

Other tools:
- conda-build 25.9.0
- rattler-build 0.48.1
- rattler-build-conda-compat 1.4.9

Signed-off-by: Michał Górny <[email protected]>
@mgorny
Contributor Author

mgorny commented Oct 28, 2025

I could use some Windows hints here:

 [1/2] Building CXX object CMakeFiles\cmake_test.dir\main.cpp.obj
FAILED: [code=2] CMakeFiles/cmake_test.dir/main.cpp.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1440~1.338\bin\Hostx64\x64\cl.exe  /nologo /TP -DABSL_CONSUME_DLL -DPROTOBUF_USE_DLLS -DPy_NO_LINK_LIB -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -I\include -external:I%PREFIX%\Library\include -external:I%PREFIX%\Library\include\torch\csrc\api\include -external:I%PREFIX%\Include -external:W0 /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /O2 /Ob2 /DNDEBUG -std:c++17 /permissive- /EHsc /bigobj /showIncludes /FoCMakeFiles\cmake_test.dir\main.cpp.obj /FdCMakeFiles\cmake_test.dir\ /FS -c %SRC_DIR%\cmake_test\main.cpp
%PREFIX%\Library\include\fmt\base.h(458): error C2338: static_assert failed: 'Unicode support requires compiling with /utf-8'
ninja: build stopped: subcommand failed.

Is this something to add in our test CMakeLists.txt? Apparently PyTorch does it for their own sources:

https://github.com/pytorch/pytorch/blob/544b443ea1d1a9b19e65f981168a01cb87a2d333/CMakeLists.txt#L862-L865

But that kinda implies all downstream projects will have to do it as well. Not sure if it also happens when using vendored libfmt, but I guess so.
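
What "doing it as well" would mean for a downstream build can be sketched as follows. This is a hedged Python illustration for a build script that assembles cl.exe arguments itself; `msvc_flags_for_fmt` is a hypothetical helper, and the fix for the test's CMakeLists.txt would instead be a compile option set in CMake.

```python
def msvc_flags_for_fmt(flags: list, compiler_is_msvc: bool) -> list:
    """Return a copy of `flags` with /utf-8 appended for MSVC if missing."""
    if not compiler_is_msvc:
        # Other compilers do not hit the static_assert shown in the log above.
        return list(flags)
    # cl.exe accepts options with either "/" or "-" as the prefix.
    if any(flag.lower() in ("/utf-8", "-utf-8") for flag in flags):
        return list(flags)
    return [*flags, "/utf-8"]
```

For example, `msvc_flags_for_fmt(["/nologo", "/EHsc"], True)` yields the original flags plus `/utf-8`, while a non-MSVC flag list is returned unchanged.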

Development

Successfully merging this pull request may close these issues.

PyTorch 2.9.0

4 participants